Table 14.2 will reveal some striking instances—the genomes of amoebae and lungfish
considerably exceeding in size those of ourselves, for example.
Before delving into this question more deeply, three relatively trivial factors affecting
the C-value should be pointed out. The first is experimental uncertainty, and
ambiguity in the precise definition of the C-value. Second, in some cases, genome
size is merely estimated from the total mass of DNA in a cell. This makes the given
value highly dependent on ploidy; polyploidy is unusual in mammals but not in amphibians
and fish, and rather common in plants. For example, the lungfish, which has a conspicuously
large C-value, is known to be tetraploid. Amoebae, which apparently
have an even larger C-value, are likely to be polyploid and, moreover, the amount
of DNA found in an amoeba cell may well be inflated by the remains of genetic
material of recently ingested prey. Care should therefore be taken to ascertain the
amount of genetic material corresponding to the haploid genome for the purposes
of comparison. The third factor is the presence of enormous quantities of repetitive
DNA in many eukaryotic genomes. These repetitive sequences include retrotransposons,
vestiges of retroviruses, and so forth. Probably about half of the human
genome can be accounted for in this way, and it seems not unreasonable to consider
this as “junk” (although it appears to play a rôle in the condensation of the DNA into
heterochromatin; see Sect. 14.4.4). 26
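The corrections for polyploidy and repetitive DNA described above amount to simple arithmetic, which the following minimal sketch illustrates. The function names, the example mass of 8 pg, and the repeat fraction of 0.5 are hypothetical choices made for illustration only; the conversion factor of roughly 0.978 × 10^9 base pairs per picogram of double-stranded DNA is the commonly used approximation and is not taken from this text.

```python
# Approximate conversion (an assumption of this sketch, not from the text):
# 1 pg of double-stranded DNA corresponds to about 0.978e9 base pairs.
BP_PER_PG = 0.978e9


def haploid_genome_bp(dna_mass_pg: float, ploidy: int) -> float:
    """Estimate the haploid genome size (bp) from the DNA mass of one nucleus."""
    if ploidy < 1:
        raise ValueError("ploidy must be a positive integer")
    if dna_mass_pg < 0:
        raise ValueError("DNA mass cannot be negative")
    # Divide out the number of genome copies, then convert mass to base pairs.
    return dna_mass_pg / ploidy * BP_PER_PG


def nonrepetitive_bp(haploid_bp: float, repeat_fraction: float) -> float:
    """Remove the fraction of the haploid genome made up of repetitive DNA."""
    if not 0.0 <= repeat_fraction <= 1.0:
        raise ValueError("repeat_fraction must lie between 0 and 1")
    return haploid_bp * (1.0 - repeat_fraction)


if __name__ == "__main__":
    # Hypothetical tetraploid organism with 8 pg of DNA per nucleus,
    # half of whose haploid genome is assumed to be repetitive.
    haploid = haploid_genome_bp(dna_mass_pg=8.0, ploidy=4)
    unique = nonrepetitive_bp(haploid, repeat_fraction=0.5)
    print(f"haploid genome:       {haploid:.3e} bp")
    print(f"non-repetitive part:  {unique:.3e} bp")
```

Comparing organisms on the corrected figure rather than on the raw DNA mass per cell removes the two most obvious sources of inflation, although it does nothing about experimental uncertainty or contamination by ingested material.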
Is There a G-Value Paradox?
By correcting for polyploidy and repetitive junk, one arrives at the quantities of
DNA involved in protein synthesis (both the genes themselves and the regulatory
overhead). In some cases, the actual number of genes (the G-value) can be estimated
with reasonable confidence; in other cases, the simple application of a compression
algorithm (Sect. 7.4) can be used to provide a minimal description (an approximation
to the algorithmic information content; see Chap. 11), which correlates much better
with presumed organismal complexity (as measured, for example, by the number of
different cell types) than does the raw C-value. Where gene number estimates are available, however, the more
complex organisms do not seem to have enough genes. Especially if the figure for H.
sapiens has to be revised downward to a mere 20 000, we end up with fewer genes
than A. thaliana, for example! This is the so-called G-value paradox. Its resolution
would appear to lie with enhanced alternative splicing possibilities for more complex
organisms. We humans appear to have the largest intron sizes, for example. 27
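The idea of using a compression algorithm to approximate the algorithmic information content of a genome can be sketched as follows. This is an illustration under stated assumptions, not the procedure of Sect. 7.4: it substitutes the general-purpose DEFLATE compressor from Python's standard zlib module, and the helper names and the two toy sequences are hypothetical. The compressed length is only an upper bound on the true algorithmic information content.

```python
import random
import zlib


def compressed_size(sequence: str) -> int:
    """Bytes needed to store the sequence after DEFLATE compression."""
    return len(zlib.compress(sequence.encode("ascii"), 9))


def compression_ratio(sequence: str) -> float:
    """Compressed size divided by original size (smaller means more redundant)."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    return compressed_size(sequence) / len(sequence)


def random_sequence(length: int, seed: int = 0) -> str:
    """Toy stand-in for non-repetitive DNA: bases drawn uniformly at random."""
    rng = random.Random(seed)
    return "".join(rng.choice("ACGT") for _ in range(length))


if __name__ == "__main__":
    # Two hypothetical sequences of equal length: one a pure tandem repeat,
    # the other with no built-in structure.
    repetitive = "ACGT" * 2500
    unstructured = random_sequence(10_000)
    for name, seq in [("repetitive", repetitive), ("unstructured", unstructured)]:
        print(f"{name:12s} length={len(seq)} "
              f"compressed={compressed_size(seq)} bytes "
              f"ratio={compression_ratio(seq):.3f}")
```

Although both toy sequences have the same raw length, and hence would contribute equally to a C-value, the tandem repeat compresses to a small fraction of the size of the unstructured sequence. This is the sense in which a compressed description discounts repetitive DNA.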
26 Regarding the remainder, about 5% is considered to be conserved (by comparison with the
mouse); 1.2% is estimated to be used for coding proteins, and the remaining 3.8% is referred to as
“noncoding”, although conservation of sequence is taken to imply a significant function (it seems
very probable that this “noncoding” DNA is used to encode the small interfering RNA used to
supplement protein-based transcription factors as regulatory elements). That still leaves the enigma
of the remaining 40–50% that is neither repetitive nor coding in any sense understood at present.
27 Taft et al. (1992). Note the connexion between alternative splicing and Tonegawa’s mechanism
for generating B-cell lymphocyte (and hence antibody) diversity in the immune system (Sect. 14.6).